Abstract

The early detection of disease epidemics reduces the chance of successful introductions into new locales, minimizes the 
number of infections, and reduces the financial impact. We develop a framework to determine the optimal sampling 
strategy for disease detection in zoonotic host-vector epidemiological systems when a disease goes from below detectable 
levels to an epidemic. We find that if the time of disease introduction is known then the optimal sampling strategy can 
switch abruptly between sampling only from the vector population to sampling only from the host population. We also 
construct time-independent optimal sampling strategies when conducting periodic sampling that can involve sampling 
both the host and the vector populations simultaneously. Both time-dependent and -independent solutions can be useful 
for sampling design, depending on whether the time of introduction of the disease is known or not. We illustrate the 
approach with West Nile virus, a globally-spreading zoonotic arbovirus. Though our analytical results are based on a 
linearization of the dynamical systems, the sampling rules appear robust over a wide range of parameter space when 
compared to nonlinear simulation models. Our results suggest some simple rules that can be used by practitioners when 
developing surveillance programs. These rules require knowledge of transition rates between epidemiological 
compartments, which population was initially infected, and of the cost per sample for serological tests. 
This is a PLOS Computational Biology Methods article.

Introduction 

The effectiveness of disease control measures often depends on
when outbreaks are first discovered. Early detection can significantly
reduce the costs associated with disease eradication, human
illnesses, and devastation of livestock or crops. For example, the
2001 epidemic of foot and mouth disease in Great Britain was reported
only 2 weeks after the epidemic began [1] yet had an estimated
financial impact of $11.9-$18.4 billion [2]. A hypothetical
foot and mouth disease epidemic in California not detected for 2 weeks
could have a financial impact of over $15 billion, and an
epidemic not detected for 3 weeks could have an impact of up to
$69 billion [3]. Although many studies have examined
alternative control strategies and the impact of detection time on
control [1,2,4], the complementary question of how to achieve
early detection has been relatively neglected by theory. Greater
attention to the design of disease surveillance methods may
facilitate earlier detection and reduce the economic impacts of
disease epidemics.

Passive surveillance consists of the voluntary reporting of
cases by primary care providers and citizens to public health
officials [5]. Recent work on passive surveillance methods for
human infectious diseases has progressed rapidly and includes
developing methods to optimize the placement [6,7] and
performance [8] of surveillance sites. Integrating these physical
surveillance systems with internet search data has led to
improvements in the performance of traditional physical reporting
systems [7,9]. Active surveillance of zoonotic diseases consists of
periodic sampling by health authorities [5]. For vector-borne
diseases, active surveillance may include the use of sentinel animals
and the longitudinal sampling of vector populations [10]. Active
surveillance may often perform better for targeted objectives than
passive methods [5], and recent work has begun to link active
zoonotic surveillance data to epidemiological models. For example,
Gerardo-Giorda et al. [11] combined surveillance data and
epidemiological models to identify counties that were most
important for surveillance efforts of rabies in New York State. It
is likely that analytical approaches will prove useful in making
active zoonotic surveillance methods more cost effective, an
important consideration for surveillance organizations with limited
resources [12].

Past analytical work on active disease detection examined how 
sampling for infected individuals in a susceptible population 
affects the time at which an epidemic is detected [13,14] and the 
subsequent incidence of a disease at the time of discovery [15]. 
Author Summary

Outbreaks of zoonoses can have large costs to society
through public health and agricultural impacts. Because
many zoonoses co-occur in multiple animal populations
simultaneously, detection of zoonotic outbreaks can be
especially difficult. We evaluated how to design sampling
strategies for the early detection of disease outbreaks of
vector-borne diseases. We built a framework to integrate
epidemiological dynamical models with a sampling process
that accounts for budgetary constraints, such as those
faced by many management agencies. We illustrate our
approach using West Nile virus, a globally-spreading
zoonotic arbovirus that has significantly affected North
American bird populations. Our results suggest that simple
formulas can often make robust predictions about the
proper sampling procedure, though we also illustrate how
computational methods can be used to extend our
framework to more realistic modeling scenarios when
these simple predictions break down.

These studies have examined the dynamics of diseases that are 
directly transmitted and thus lack a disease vector. As a result, we 
still have little knowledge to guide early detection theory for 
zoonotic diseases (e.g., Lyme disease, malaria, Rift Valley fever 
virus, West Nile virus, dengue fever), where sampling could occur 
in vector populations or host populations. Here, we studied the 
optimal sampling design for early disease detection using 
formulations of a disease with one host population and one 
vector population. We combined models of host-vector dynamics 
with a periodic sampling procedure in which sample size is 
constrained by economic limitations. We used a susceptible-infected
(SI) model to examine how to allocate sampling effort
between the vector and host populations, and we used a 
susceptible-infected-recovered (SIR) model to look at allocating 
sampling effort between the vector population, infected hosts, and 
recovered hosts. 

The CDC guidelines for evaluating public health surveillance of 
human based diseases [16] are standards that have been used in 
many assessments of zoonotic surveillance systems, although 
differences may exist between human and zoonotic surveillance 
goals [17]. A recent survey on the assessment of surveillance 
systems found that a number of different metrics have been used to 
determine zoonotic surveillance performance; two of the most 
frequently mentioned criteria are the sensitivity of surveillance (the 
ability to detect outbreaks or infection rates) and the time to 
outbreak detection from initial exposure [17]. Here we assume the 
goal of surveillance is to detect the outbreak as early as possible to 
minimize financial damages or spillover human infections, a 
common goal for zoonoses [12]. Our results provide some basic 
rules of thumb for practitioners designing active surveillance 
protocols for vector-borne diseases. 

Models 

We modeled the early dynamics of a disease with a single vector 
species and a single, non-human host species. Here we define 
early-time approximations to systems where vectors follow SI
dynamics and hosts follow either SI or SIR dynamics. We then
define a sampling model that can be applied to these dynamical
systems. The sampling of human hosts often has additional 
considerations not accounted for in this sampling framework. We 
therefore address specific issues about human populations in the 
discussion. 



Vector-host SI model 

We made assumptions common to other SI models of vector-borne
diseases: vectors and hosts can be in either a susceptible or
an infected state at time $t$; the disease epidemic (the dynamics of
interest) occurs on a relatively short time-scale, so infected
individuals cannot recover and individuals neither give birth nor die
over the course of the epidemic; and infection spreads only through
interspecific interactions [18,19]. Subscripts are used to denote
population-level parameters: e.g., $I_H(t)$ and $I_V(t)$ denote the
number of infected hosts ($H$) and vectors ($V$) at time $t$, respectively
(Table 1). These assumptions give the following system of
equations for the dynamics of infection of the SI model:

$$\dot{I}_V(t) = \beta_{V,H} \frac{I_H(t)}{N_H} \left(N_V - I_V(t)\right) \tag{1a}$$

$$\dot{I}_H(t) = \beta_{H,V} \frac{I_V(t)}{N_V} \left(N_H - I_H(t)\right), \tag{1b}$$

where $\beta_{H,V}$ and $\beta_{V,H}$ correspond to the disease transmission rates
from vectors to hosts and hosts to vectors, respectively. $N_H$ and
$N_V$ correspond to the total host and vector population sizes,
respectively. Throughout this work we assume that population
sizes are constant over the course of the epidemic and that
individuals are in the population only if they can potentially
contract the disease. This implies that individuals that are
epidemiologically isolated are not a part of the population. Note
that the dynamics of susceptible host ($S_H(t)$) and vector ($S_V(t)$)
populations are completely determined by system (1) because
$S_H(t) = N_H - I_H(t)$ and $S_V(t) = N_V - I_V(t)$.

Because we are interested in detecting a disease as early as
possible, we focus on the dynamics of the system immediately after
disease introduction. Therefore we linearized system (1) about the
disease-free state ($I_V(0) = 0$, $I_H(0) = 0$) and obtained:

$$\dot{I}_V(t) = \beta_{V,H} \frac{N_V}{N_H} I_H(t) \tag{2a}$$

$$\dot{I}_H(t) = \beta_{H,V} \frac{N_H}{N_V} I_V(t). \tag{2b}$$

We focus our subsequent analyses on the specific scenario of an
epidemic started by an infected host with the initial conditions
($I_H(0) > 0$, $I_V(0) = 0$), though analyses of alternative initial
conditions ($I_H(0) = 0$, $I_V(0) > 0$) are presented in Text S3 and S4.
Because of the symmetric nature of system (2), this analysis yields
similar results. With the assumptions listed above, the solution to
system (2) is:

$$I_V(t) = I_H(0) \frac{\beta_{V,H} N_V}{\lambda N_H} \sinh(\lambda t) \tag{3a}$$

$$I_H(t) = I_H(0) \cosh(\lambda t), \tag{3b}$$

where $\lambda = \sqrt{\beta_{V,H} \beta_{H,V}}$.
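As a rough check on how long the early-time approximation holds, the sketch below compares the closed-form solution (3) with a numerical integration of the full system (1). The transmission rates are the Table 1 values; the population sizes, the single initially infected host, and the fixed-step Runge-Kutta integrator are illustrative assumptions and are not taken from the text.

```python
import math

BETA_HV = 0.0792  # daily transmission rate, vectors -> hosts (Table 1)
BETA_VH = 0.0144  # daily transmission rate, hosts -> vectors (Table 1)


def linearized_si(t, n_v, n_h, i_h0=1.0):
    """Closed-form solution (3) for an epidemic started by i_h0 infected hosts."""
    lam = math.sqrt(BETA_VH * BETA_HV)
    i_v = i_h0 * BETA_VH * n_v / (lam * n_h) * math.sinh(lam * t)
    i_h = i_h0 * math.cosh(lam * t)
    return i_v, i_h


def full_si(t_end, n_v, n_h, i_h0=1.0, dt=0.01):
    """Integrate the nonlinear system (1) with a fixed-step RK4 scheme."""
    def rhs(i_v, i_h):
        return (BETA_VH * i_h / n_h * (n_v - i_v),
                BETA_HV * i_v / n_v * (n_h - i_h))

    i_v, i_h = 0.0, i_h0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(i_v, i_h)
        k2 = rhs(i_v + 0.5 * dt * k1[0], i_h + 0.5 * dt * k1[1])
        k3 = rhs(i_v + 0.5 * dt * k2[0], i_h + 0.5 * dt * k2[1])
        k4 = rhs(i_v + dt * k3[0], i_h + dt * k3[1])
        i_v += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        i_h += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return i_v, i_h


if __name__ == "__main__":
    N_V, N_H = 10_000, 1_000  # hypothetical population sizes
    for day in (7, 30, 90):
        print(day, linearized_si(day, N_V, N_H), full_si(day, N_V, N_H))
```

The two solutions should stay close while the infected fractions are small and drift apart as susceptibles are depleted, which is the regime where the linearization is no longer appropriate.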
SIR host-SI vector model

For the SIR model we assume that recovered hosts obtain 
immunity over the timescale of the epidemic. As in the SI model 
Table 1. Descriptions of parameters and sources of parameter estimates.

| Parameter | Description | Value | Source |
|---|---|---|---|
| $N_V$ | Abundance of vectors. | | |
| $N_H$ | Abundance of hosts. | | |
| $I_{V \text{ or } H}$ | Number of infected individuals in the vector or host populations. | | |
| $R_H$ | Number of recovered individuals in the host population. | | |
| $\beta_{H,V}$ | Daily transmission rate from vectors to hosts. | 0.0792 | [26] |
| $\beta_{V,H}$ | Daily transmission rate from hosts to vectors. | 0.0144 | [26] |
| $1/\gamma$ | Number of days until infected host recovers. | varied | |
| $\eta_{V \text{ or } H \text{ or } R}$ | The economic efficiency of the vector, infected host, or recovered host populations. | | |
| $C_{max}$ | Resources allocated to obtaining and running samples in a fixed period of time. | 88 | in text |
| $b_{V \text{ or } H \text{ or } R}$ | Cost of running a sample from the vector population, infected host population, or recovered host population. | $b_V = b_H = b_R = 1$ | in text |
| $s_{V \text{ or } H \text{ or } R}$ | Sample size of the vector population, infected host population, or recovered host population. | | |
| $s^*$ | Optimal sample design for an epidemiological system that is sampled with economic constraints. | | |
| $p_{V \text{ or } H \text{ or } R}$ | Pool sizes of the vector population, infected host population, or recovered host population. | $p_V = 50$, $p_H = p_R = 6$ | in text |
| | Critical time in a linearized system. At this time the optimal sampling design changes from stratum X to stratum Y. | | |

doi:10.1371/journal.pcbi.1003668.t001



the disease cannot spread through direct contacts within host and
vector populations, transmission is frequency-dependent, and
individuals are not born and do not die over the course of the
epidemic. The full model for a single host population and single
vector population is given by

$$\dot{I}_V(t) = \beta_{V,H} \frac{I_H(t)}{N_H} \left(N_V - I_V(t)\right) \tag{4a}$$

$$\dot{I}_H(t) = \beta_{H,V} \frac{I_V(t)}{N_V} \left(N_H - I_H(t) - R_H(t)\right) - \gamma I_H(t) \tag{4b}$$

$$\dot{R}_H(t) = \gamma I_H(t), \tag{4c}$$

where $R_H(t)$ designates recovered individuals and $\gamma$ is the recovery
rate of infected individuals. Note that the dynamics of susceptible
host ($S_H(t)$) and vector ($S_V(t)$) classes are completely determined
by system (4) because $S_H(t) = N_H - I_H(t) - R_H(t)$ and
$S_V(t) = N_V - I_V(t)$. The corresponding linearized model evaluated
at ($I_V(0) = 0$, $I_H(0) = 0$, $R_H(0) = 0$) is

$$\dot{I}_V(t) = \beta_{V,H} \frac{N_V}{N_H} I_H(t) \tag{5a}$$

$$\dot{I}_H(t) = \beta_{H,V} \frac{N_H}{N_V} I_V(t) - \gamma I_H(t) \tag{5b}$$

$$\dot{R}_H(t) = \gamma I_H(t). \tag{5c}$$

For an epidemic begun by an infected host
($I_V(0) = 0$, $I_H(0) > 0$, $R_H(0) = 0$), the solution of (5) is:

$$I_V(t) = I_H(0) \frac{2 \beta_{V,H} N_V}{\mu N_H} e^{-\gamma t/2} \sinh(\mu t/2) \tag{6a}$$

$$I_H(t) = I_H(0) e^{-\gamma t/2} \left(\cosh(\mu t/2) - \frac{\gamma}{\mu} \sinh(\mu t/2)\right) \tag{6b}$$

$$R_H(t) = I_H(0) e^{-\gamma t/2} \frac{2\gamma}{\mu} \sinh(\mu t/2), \tag{6c}$$

where $\mu = \sqrt{\gamma^2 + 4 \beta_{H,V} \beta_{V,H}}$. Solutions for the alternative initial
conditions ($I_V(0) > 0$, $I_H(0) = 0$, $R_H(0) = 0$) for system (5) are given
in Text S4. This analysis is slightly more complicated than that
presented in the main text, but the core ideas remain the same.
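The closed-form solution (6) can be sanity-checked against a direct numerical integration of the linearized system (5). The sketch below does this with a fixed-step Runge-Kutta scheme. The recovery rate, population sizes, and step size are illustrative assumptions (Table 1 lists the recovery time only as "varied").

```python
import math


def sir_closed_form(t, gamma, beta_vh, beta_hv, n_v, n_h, i_h0=1.0):
    """Solution (6) of the linearized SIR host / SI vector system."""
    mu = math.sqrt(gamma ** 2 + 4 * beta_hv * beta_vh)
    decay = math.exp(-gamma * t / 2)
    sh, ch = math.sinh(mu * t / 2), math.cosh(mu * t / 2)
    i_v = i_h0 * 2 * beta_vh * n_v / (mu * n_h) * decay * sh
    i_h = i_h0 * decay * (ch - gamma / mu * sh)
    r_h = i_h0 * 2 * gamma / mu * decay * sh
    return i_v, i_h, r_h


def sir_numeric(t_end, gamma, beta_vh, beta_hv, n_v, n_h, i_h0=1.0, dt=0.01):
    """Integrate the linearized system (5) with a fixed-step RK4 scheme."""
    def rhs(state):
        i_v, i_h, _ = state
        return (beta_vh * n_v / n_h * i_h,
                beta_hv * n_h / n_v * i_v - gamma * i_h,
                gamma * i_h)

    state = (0.0, i_h0, 0.0)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(state)
        k2 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k1)))
        k3 = rhs(tuple(x + 0.5 * dt * k for x, k in zip(state, k2)))
        k4 = rhs(tuple(x + dt * k for x, k in zip(state, k3)))
        state = tuple(x + dt * (a + 2 * b + 2 * c + d) / 6
                      for x, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state


if __name__ == "__main__":
    # Hypothetical values: 5-day infectious period, 10,000 vectors, 1,000 hosts.
    args = dict(gamma=0.2, beta_vh=0.0144, beta_hv=0.0792, n_v=10_000, n_h=1_000)
    print(sir_closed_form(20.0, **args))
    print(sir_numeric(20.0, **args))
```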

Sampling model 

Consider sampling at time $t$ from a population with potentially
infected hosts and vectors. Let $D(t)$ denote the event that the
disease is detected from a sample of size $s$. If the total
population abundance is much greater than the sample size then
$P(D(t))$, the probability of detecting the disease, can be modeled
using a binomial distribution. When the sample size is
comparable to the population abundance, the hypergeometric
distribution is a suitable sampling model. We do not consider the
hypergeometric model here as the binomial distribution provides a
reasonable approximation for realistic sample sizes. The proportion
of infecteds at time $t$ is given by $I(t)/N$. In a sample of size $s$,
$P(D(t))$ is the complement of the probability of not detecting any
infected individuals,
$$P(D(t)) = 1 - \left(1 - \frac{I(t)}{N}\right)^{s}. \tag{7}$$

When the disease is rare ($I(t)/N \ll 1$) equation (7) is well
approximated as

$$P(D(t)) \approx 1 - e^{-s I(t)/N}.$$
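The following sketch compares the binomial detection probability (7) with its rare-disease exponential approximation. The function names and the example prevalences are illustrative choices, not values from the text.

```python
import math


def p_detect_binomial(sample_size, prevalence):
    """Equation (7): probability that at least one sampled individual is infected."""
    return 1.0 - (1.0 - prevalence) ** sample_size


def p_detect_rare(sample_size, prevalence):
    """Exponential approximation to (7), intended for prevalence << 1."""
    return 1.0 - math.exp(-sample_size * prevalence)


if __name__ == "__main__":
    for prevalence in (0.001, 0.01, 0.1):
        exact = p_detect_binomial(100, prevalence)
        approx = p_detect_rare(100, prevalence)
        print(f"prevalence={prevalence}: exact={exact:.5f} approx={approx:.5f}")
```

Because $(1 - x)^s \le e^{-sx}$ for $0 \le x \le 1$, the approximation never overstates the detection probability, and the gap shrinks as the prevalence falls.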

If there are two sampling strata (e.g., a host and vector, although
the approach works as well for two host species or two strata of
hosts in a single species), we need the probability of detecting the
disease in either of those strata, $P(D_1(t) \text{ or } D_2(t))$. For two strata
this quantity is given by

$$P(D_1(t) \cup D_2(t)) = P(D_1(t)) + P(D_2(t)) - P(D_1(t) \cap D_2(t)) \tag{8}$$

$$\approx 1 - e^{-s_1 I_1(t)/N_1} \, e^{-s_2 I_2(t)/N_2} = 1 - e^{-s_1 I_1(t)/N_1 - s_2 I_2(t)/N_2}. \tag{9}$$

A more general form when there are $n$ sampled strata is given by

$$P\left(\bigcup_{i=1}^{n} D_i(t)\right) \approx 1 - e^{-\sum_{i=1}^{n} s_i I_i(t)/N_i}, \tag{10}$$

as shown in Text S1.

We also consider the probability of detecting the disease for the
first time in the $k$th sample when sampling occurs regularly at
discrete time intervals. A model of detecting the disease in the
current sampling period, but not before, is a geometric distribution
with time-dependent detection probabilities:

$$P(k) = \left(1 - e^{-\sum_{i=1}^{n} s_i I_i(t_k)/N_i}\right) \prod_{j=1}^{k-1} \left(e^{-\sum_{i=1}^{n} s_i I_i(t_j)/N_i}\right). \tag{11}$$



Here the number of infecteds in stratum $i$ in the $j$th sampling
period is given by $I_i(t_j)$. Sampling strata are defined by both the
animal population being sampled and the type of test that is run.
For example, immunological tests on bird populations for West
Nile virus can test whether individuals are currently infected or
have been previously infected by the type of antibody present in
the sample. Antibody-specific tests therefore distinguish between
infected birds and recovered birds. The first term on the right-hand
side of expression (11) represents the probability of detecting
the disease in time period $k$. The remaining $k - 1$ terms (given in
capital Pi notation, $\Pi$) represent the probability of not detecting
the disease in sampling period $j$, where $j$ runs from 1 to $k - 1$. The
product of these $k - 1$ terms gives the probability of not detecting
the disease in any of the $j = 1$ to $k - 1$ sampling periods. We
minimized the time until detection of the epidemic using the
geometric probability distribution defined in equation (11) and by
using the expected time to detection, given by



$$E[K] = \sum_{k=1}^{\infty} k \, P(K = k). \tag{12}$$

This expected value is an infinite series that converges to an
unknown quantity; we therefore approximated it numerically.
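One simple way to approximate (12) numerically is to truncate the series after a finite number of sampling periods and track how much probability mass is left in the tail. The sketch below takes that approach; it is an illustration under that assumption and is not necessarily the numerical scheme the authors used. The weekly schedule, sample size, and host population size in the usage example are hypothetical.

```python
import math


def first_detection_pmf(exposures):
    """Equation (11). exposures[j] is sum_i s_i * I_i(t_j) / N_i for period j + 1.

    Returns the list [P(K = 1), P(K = 2), ...] for the supplied periods.
    """
    pmf = []
    p_not_yet = 1.0  # probability of no detection in any earlier period
    for exposure in exposures:
        p_miss = math.exp(-exposure)
        pmf.append(p_not_yet * (1.0 - p_miss))
        p_not_yet *= p_miss
    return pmf


def truncated_expected_period(exposures):
    """Partial sum of (12) plus the probability mass beyond the last period."""
    pmf = first_detection_pmf(exposures)
    partial = sum(k * p for k, p in enumerate(pmf, start=1))
    tail_mass = 1.0 - sum(pmf)
    return partial, tail_mass


if __name__ == "__main__":
    # Hypothetical setup: 88 host samples per week, 1,000 hosts, one initial case,
    # host prevalence following the linearized solution (3b).
    lam = math.sqrt(0.0792 * 0.0144)
    weekly = [88 * math.cosh(lam * 7 * k) / 1000 for k in range(1, 31)]
    print(truncated_expected_period(weekly))
```

A small tail mass indicates that the truncated sum is close to the full expectation; if it is not small, more sampling periods should be included.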

We have so far considered the possibility of sampling and testing
infected and recovered individuals in populations. However, a
common practice in zoonotic surveillance is to combine samples
from multiple individuals in the stratum of interest in order to save
money (e.g., [20]). Though this pooled sampling does not identify
which individual tested positive for the virus, the goal of
surveillance is often to identify the presence of the virus instead
of a specific infected individual. Pooling sizes must be constrained
to prevent the possibility of a positive individual sample being
diluted below detectable levels, often determined using experimental
dilutions in the laboratory (e.g., [21]). To incorporate
pooled sampling into our sampling model, we rescaled the
probability of a positive detection in a single sample by the
number of individuals in a pool, $p$: the probability of detection in a
single pooled sample of stratum $i$ at time $t$ can then be approximated
using a linearized binomial expansion:

$$1 - \left(1 - \frac{I_i(t)}{N_i}\right)^{p_i} \approx p_i \frac{I_i(t)}{N_i}. \tag{13}$$



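Approximation (13) works well when $p_i I_i(t)/N_i \ll 1$. Our simulations
indicate that a 10% approximation error occurs when
$p_i I_i(t)/N_i \approx 0.25$, suggesting the approximation is robust for the
purpose of early detection. Pooled sampling modifies (10) and (11)
to

$$P\left(\bigcup_{i=1}^{n} D_i(t)\right) \approx 1 - e^{-\sum_{i=1}^{n} p_i s_i I_i(t)/N_i} \tag{14a}$$

$$P(k) = \left(1 - e^{-\sum_{i=1}^{n} p_i s_i I_i(t_k)/N_i}\right) \prod_{j=1}^{k-1} \left(e^{-\sum_{i=1}^{n} p_i s_i I_i(t_j)/N_i}\right). \tag{14b}$$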

Our goal is to determine the resource allocation that will allow us 
to detect a disease as early as possible. We therefore introduce 
economic constraints on this sampling process in the next section. 
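The pooling approximation (13) and the pooled detection probability (14a) can be explored with a few lines of code. In the sketch below the relative error is measured against the exact pooled-positive probability, which is an assumed error convention; the pool sizes 50 and 6 are the mosquito and sentinel chicken values used later in the text, while the prevalences are illustrative.

```python
import math


def pool_positive_exact(pool_size, prevalence):
    """Probability that a pool of pool_size individuals holds at least one infected."""
    return 1.0 - (1.0 - prevalence) ** pool_size


def pool_positive_linear(pool_size, prevalence):
    """Linearized binomial expansion, the right-hand side of (13)."""
    return pool_size * prevalence


def relative_error(pool_size, prevalence):
    """Overestimate of the linear form relative to the exact value."""
    exact = pool_positive_exact(pool_size, prevalence)
    return (pool_positive_linear(pool_size, prevalence) - exact) / exact


def p_detect_pooled(pools, pool_sizes, prevalences):
    """Equation (14a) for any number of strata (one list entry per stratum)."""
    exponent = sum(s * p * x for s, p, x in zip(pools, pool_sizes, prevalences))
    return 1.0 - math.exp(-exponent)


if __name__ == "__main__":
    for pool_size in (50, 6):
        prevalence = 0.25 / pool_size  # so that p * I / N = 0.25
        print(pool_size, round(relative_error(pool_size, prevalence), 3))
    print(p_detect_pooled([75, 13], [50, 6], [0.0002, 0.001]))
```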

Economic constraints and optimization 

Agencies are often faced with monitoring endemic and
emerging diseases with finite resources. This necessitates allocating
those resources in the most efficient way possible. We applied a
cost function to describe these constraints: we let $C_{max}$ be the
budget for a set of samples taken periodically and $C(s_1, \ldots, s_n)$ be
the cost of sampling $s_i$ individuals from stratum $i$, $1 \le i \le n$. If we
assume that we spend our entire budget then $C(s_1, \ldots, s_n) = C_{max}$.
For example, a linear cost function for a vector stratum and a host
stratum can be written as

$$C(s_V, s_H) = a_V + b_V s_V + a_H + b_H s_H, \tag{15}$$

where $a_V$ and $a_H$ are the overhead costs (operating costs)
associated with sampling vectors and hosts respectively, while $b_V$
and $b_H$ are the corresponding costs per sample.
We used the Karush-Kuhn-Tucker (KKT) conditions [22] to
find the sampling strategy $s^* = (s_1^*, \ldots, s_n^*)$ that maximizes the
probability of detection (given by equation (14a)) or minimizes the
time to disease detection (given by equation (14b)). The KKT
approach allows the minimization of a function subject to
inequality constraints, e.g., constraining the sample sizes to be
nonnegative. Further details on this method, as well as some
general results for cases with linear objective functions, are
provided in Text S2 and S3.

Results 

Here we apply the sampling framework defined in the Methods 
to hypothetical epidemics of West Nile virus. We provide some 
analytical results and examine via computer simulation how well 
our early-time approximations match the corresponding full 
models. 

SI dynamics 

A general solution to the host-vector SI model. We now
determine the sampling scheme that maximizes the probability of
disease detection under the assumptions of the full SI model for
one host and one vector species, given by system (1). To find
potential optimal sampling schemes in the SI host-vector model,
$s^* = (s_V^*, s_H^*)$, we maximize the probability that the disease will be
detected in a single sampling trial at a fixed time $t$, assuming that
the states $I_V(t)$ and $I_H(t)$ are known. When applying the KKT
conditions to equations (10) or (11), we see that the optimal
sampling scheme depends on a quantity that we call the economic
efficiency of a stratum. The economic efficiency of stratum $j$ is

$$\eta_j = \frac{I_j(t)}{N_j C_{s_j}(s^*)}, \tag{16}$$

where $C_{s_j}(s^*)$ is the partial derivative of the cost function with
respect to the sample size of the $j$th stratum evaluated at the
optimal strategy $s^*$. The economic efficiency, $\eta$, gives the marginal
return on a dollar investment: i.e., it gives the expected number of
new positive detections for a small change in added investment.
This quantity, derived in Text S3, is useful for testing the
conditions needed to determine the optimal sampling design.

With reference to Text S3, and making no assumptions about
the form of the cost function $C$, we see that there are five possible
types of optimal sampling scheme. First, it is possible that there is
no nontrivial optimal sampling scheme. This would be
the case if the total overhead cost was as large as or larger than the
budget (i.e. $a_V + a_H \ge C_{max}$). Next, if the disease is not present, all
economically feasible sampling schemes give the same (zero)
probability of detection. Lastly, if neither of these cases holds, then
there exists some nontrivial optimal sampling scheme $s^*$. The
KKT conditions give candidates for $s^*$. All candidates must then
be evaluated to determine the true optimal sampling scheme $s^*$.
Candidates for the nontrivial optimal sampling scheme at a fixed
time $t$ are determined by comparing $\eta_V(t)$ and $\eta_H(t)$:

First, if there exists some $s^* = (s_V^*, s_H^*)$ such that $s_V^* > 0$, $s_H^* > 0$
and if

$$\eta_V(s^*) = \eta_H(s^*) \tag{17}$$

then $s^*$ is a candidate for the optimal sampling scheme. This
sampling scheme would involve sampling from both the vector
and the host populations. Second, if there exists some $s^*$ such that
$s_V^* > 0$, $s_H^* = 0$ and if

$$\eta_V(s^*) > \eta_H(s^*) \tag{18}$$

then $s^*$ is a candidate for the optimal sampling scheme. This
sampling scheme would entail sampling from only the vector
population. Third, if there exists some $s^*$ such that $s_V^* = 0$, $s_H^* > 0$
and if

$$\eta_V(s^*) < \eta_H(s^*) \tag{19}$$

then $s^*$ is a candidate for the optimal sampling scheme. This
sampling scheme would entail sampling from only the host
population. Note that for a nonlinear cost function, it is possible
that more than one of (17)-(19) may hold at the same time for
different sampling strategies. Once all candidates for $s^*$ have been
found, the probability of detection (given in (10)) must be evaluated
for each candidate and maximized.

In the case of a linear cost function, that is, when
$C(s_V, s_H) = a_V + b_V s_V + a_H + b_H s_H$, the partial derivatives
$C_{s_V}(s_V, s_H)$ and $C_{s_H}(s_V, s_H)$ are constants. Thus, the expressions
for $\eta_V$ and $\eta_H$ are independent of $s^*$:

$$\eta_V(s^*) = \frac{I_V(t)}{N_V C_{s_V}(s^*)} = \frac{I_V(t)}{N_V b_V}, \qquad \eta_H(s^*) = \frac{I_H(t)}{N_H C_{s_H}(s^*)} = \frac{I_H(t)}{N_H b_H}.$$

Now, since at any time $t$ only one of the relations (17)-(19) can
hold, there is only one candidate for the nontrivial sampling
scheme $s^*$ and we have that the relative magnitudes of
$\frac{I_V(t)}{N_V b_V}$ and $\frac{I_H(t)}{N_H b_H}$, or equivalently
$\frac{I_V(t)/N_V}{I_H(t)/N_H}$ and $\frac{b_V}{b_H}$, determine the nontrivial
optimal sampling scheme $s^*$. Thus, we deduce the shape of the
curve $\frac{I_V(t)/N_V}{I_H(t)/N_H}$ with respect to time $t$. A complete treatment of
this problem is given in Text S3. Here, we give an outline of the
solution, though the rest of this section can be skipped by readers
uninterested in this level of mathematical detail.

To simplify our notation in the following analysis, we define

$$V(t) = \frac{I_V(t)}{N_V} \quad \text{and} \quad H(t) = \frac{I_H(t)}{N_H} \tag{20}$$

to be the proportion of the vector (host) population that is infected.
We are then concerned with the curve $\frac{V(t)}{H(t)}$. Noting that whenever
$V(t)$ is positive $H(t)$ is strictly increasing with respect to time $t$, we
reparameterize $V(t)$ as a function of $H$ to obtain $V(H)$. We now
see that our original problem is equivalent to characterizing the
curve $\frac{V(H)}{H}$ with respect to $H$. Thus we can rewrite the conditions
outlined in equations (17)-(19) as:

$$\frac{V(H)}{H} = \frac{b_V}{b_H} \implies \text{sample both vectors and hosts} \tag{21}$$

$$\frac{V(H)}{H} > \frac{b_V}{b_H} \implies \text{sample only vectors} \tag{22}$$

$$\frac{V(H)}{H} < \frac{b_V}{b_H} \implies \text{sample only hosts.} \tag{23}$$

PLOS Computational Biology | www.ploscompbiol.org | June 2014 | Volume 10 | Issue 6 | e1003668

Optimal Sampling Strategies for Detecting Zoonotic Disease Epidemics



The curve $\frac{V(H)}{H}$ can be completely characterized by the ratio
$\beta_{H,V}/\beta_{V,H}$ and the initial conditions (Text S3). The three relative
states of $\beta_{H,V}/\beta_{V,H}$ (i.e., $> 1$, $= 1$, or $< 1$) and the two initial
conditions (i.e., the disease starts in the vector vs. the host) define
six qualitatively different curves for $\frac{V(H)}{H}$ (Figure 1). It is
important to note that for fixed sampling costs $b_V$ and $b_H$, the
optimal sampling scheme depends only on the proportion of a
population that is infected ($V$ and $H$). Thus the number of
infecteds ($I_V$ and $I_H$) and the time since introduction affect the
optimal sampling scheme only via their influence on the
proportion of a population that is infected.
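For a linear cost function with zero overhead, the comparison of $V/H$ against $b_V/b_H$ can be written as a short decision rule and checked against a brute-force search along the budget line. The sketch below assumes no pooling; the prevalences and the budget are illustrative, and the function names are hypothetical.

```python
import math


def detection_probability(s_v, s_h, v, h):
    """Two-stratum detection probability, with v and h the infected proportions."""
    return 1.0 - math.exp(-(s_v * v + s_h * h))


def rule_of_thumb(v, h, b_v, b_h):
    """Compare V/H with b_V/b_H without dividing, so that h = 0 is handled."""
    if v * b_h > h * b_v:
        return "vectors"
    if v * b_h < h * b_v:
        return "hosts"
    return "both"


def brute_force_allocation(v, h, b_v, b_h, budget):
    """Search whole numbers of vector samples, spending the remainder on hosts."""
    best = None
    for s_v in range(int(budget // b_v) + 1):
        s_h = (budget - b_v * s_v) / b_h
        p = detection_probability(s_v, s_h, v, h)
        if best is None or p > best[2]:
            best = (s_v, s_h, p)
    return best


if __name__ == "__main__":
    # Hypothetical prevalences with equal unit costs and a budget of 88 samples.
    print(rule_of_thumb(0.002, 0.001, 1.0, 1.0))
    print(brute_force_allocation(0.002, 0.001, 1.0, 1.0, 88))
```

Because the exponent is linear in the sample sizes, the maximum sits at a corner of the budget line unless the two efficiencies are exactly equal, which is why the search and the rule agree.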

We can use Figure 1 to elucidate the optimal sampling scheme
at a given time. If the curve $\frac{V(H)}{H}$ lies above $\frac{b_V}{b_H}$ (i.e. satisfies (22)),
then the optimal scheme is to sample only vectors. Conversely, if
the curve $\frac{V(H)}{H}$ lies below $\frac{b_V}{b_H}$ (i.e. satisfies (23)), then the optimal
scheme is to sample only hosts. In all six cases, the best sampling
scheme at very early times (corresponding to a small proportion of
the host population being infected, i.e. small $H$) is to sample the
population in which the epidemic originated; however, as the
epidemic progresses, it is possible to have a switch to sampling the
population that was not initially infected.



Lastly, we consider the effect of sampling error on our
determination of optimal sampling scheme. Suppose there is a
small error $\epsilon_V$ ($\epsilon_H$) in the detected proportion of infected vectors
(hosts). Then by equations (21)-(23), the optimal sampling scheme
is determined by comparing the relative magnitudes of
$\frac{V(H) + \epsilon_V}{H + \epsilon_H}$ and $\frac{b_V}{b_H}$. Early in the progression of the epidemic, both $V(H)$ and
$H$ may be very small, even on the same order of magnitude as the
errors $\epsilon_V$ and $\epsilon_H$. In this case, sampling error can easily alter the
determined optimal sampling scheme. As the disease progresses,
$V(H)$ and $H$ become larger and the perturbation by $\epsilon_V$ ($\epsilon_H$)
becomes less significant. In the following example, we revert to our
original notation, given in (20).

A general SI model of West Nile virus. We illustrate an 
application of sampling optimization by applying our approach to 
West Nile virus, a mosquito-borne pathogen introduced to the 
United States in 1999 [23]. Since introduction the disease has 
resulted in numerous deaths in humans [24] and large-scale 
declines in bird abundances [25] . Testing for West Nile virus has 
often relied on counts of dead birds. However, as pointed out by 
Hochachka et al. [10], this may only be useful for indicating the 
later stages of severe epidemics and thus fails to lead to effective 
containment of the epidemic. Here, we examine an alternative 
strategy wherein both mosquito traps and sentinel chickens, 
groups of chickens placed in cages and distributed throughout an 
area, are used to detect a disease epidemic. In the following 
analyses we investigate sampling between vector and bird host 
populations but neglect spillover hosts such as humans and horses. 

Epidemiological parameter values are taken from the literature 
[26] or determined here; a summary is provided in Table 1. We 
Figure 1. The relative magnitudes of $V(H)$ and $H$ under six cases that differ in initial condition and relative transmission rates. The
position of the curve (above or below) relative to the red dashed line (of height one) remains the same for any choice of $\beta_{H,V}$ and $\beta_{V,H}$ within each case
and 6, and the blue curve would approach positive infinity in Cases 1, 3, and 5.
assumed that the overhead costs were zero ($a_V = a_H = 0$) and that
the cost of running a test is the same for a sample of mosquitoes or
birds, which we set to the arbitrary value of 1 ($b_V = b_H = 1$).
Mosquito samples are usually pooled into batches of around 50
individuals [20], while blood samples are pooled from birds in a
single sentinel chicken cage, usually with 6 birds. Therefore we let
$p_V = 50$ and $p_H = 6$. In the following analyses we assumed that the
disease is introduced by the host, rather than the vector. This is
likely a more common method of introduction for West Nile virus
as birds typically move over much greater distances than
mosquitoes (e.g., [27]).

We let $s_V$ be the number of pooled vector samples tested and $s_H$
be the number of pooled host samples tested so that $p_V s_V$ and
$p_H s_H$ are the total number of individuals tested in the vector and
host populations. Plugging the values defined above into (15) gives
$C_{max} = s_V + s_H$, while the economic efficiencies for the vector and
host populations are given by

$$\eta_V = p_V \frac{I_V(t)}{N_V} = 50 \frac{I_V(t)}{N_V} \quad \text{and} \quad \eta_H = p_H \frac{I_H(t)}{N_H} = 6 \frac{I_H(t)}{N_H},$$

respectively. The slight modification of the economic efficiencies
given in (16) includes the effect that pooling has on increasing the
economic efficiency of sampling. We assumed a weekly budget
commensurate with a county-level mosquito control agency. For
example, the agency in Hillsborough County, Florida maintains
13 sentinel chicken flocks and 75 mosquito light traps that are
typically checked weekly during peak season. Thus, we assumed
that the weekly budget is $C_{max} = 75 + 13 = 88$.

We now use the analysis of the previous section to obtain some
qualitative results regarding the aforementioned disease detection
scenario. Recall that $b_V = b_H$, but that different numbers of
mosquitoes ($p_V = 50$) and chickens ($p_H = 6$) are pooled. As a result,
the conditions given in (21)-(23) become:

$$\frac{I_V(t)/N_V}{I_H(t)/N_H} = \frac{b_V p_H}{b_H p_V} = \frac{6}{50} \implies \text{sample both vectors and hosts}$$

$$\frac{I_V(t)/N_V}{I_H(t)/N_H} > \frac{b_V p_H}{b_H p_V} = \frac{6}{50} \implies \text{sample only vectors}$$

$$\frac{I_V(t)/N_V}{I_H(t)/N_H} < \frac{b_V p_H}{b_H p_V} = \frac{6}{50} \implies \text{sample only hosts.}$$

Note that from the parameter values given in Table 1 we have

$$\frac{\beta_{H,V}}{\beta_{V,H}} = \frac{0.0792}{0.0144} = 5.5,$$

which is larger than one. Thus (by Table 3 in Text S3), if a vector
is initially infected, the curve $\frac{I_V N_H}{N_V I_H}$ is qualitatively similar to that
shown in Case 5 of Figure 1. If a host is initially infected, the curve
$\frac{I_V N_H}{N_V I_H}$ is qualitatively similar to that shown in Case 6 of Figure 1.



Suppose that a host is initially infected (Figure 1, Case 6). Then
$\frac{I_V N_H}{N_V I_H}$ is close to zero at early times, so the above conditions imply
that the optimal sampling strategy is to sample only hosts. As the
infected vector population grows and this ratio becomes larger than
$\frac{6}{50}$, there is a switch from sampling only the host population to
sampling only the vector population. Since this switch depends on
the ratio of the proportions of the populations that are infected, the
switch will occur at different times for different total population
sizes, assuming a constant number of initially infected individuals
and constant transmission rates. Note that if it is more expensive to
sample the host population than the vector population ($b_V < b_H$)
then the time to switch from sampling only hosts to sampling only
vectors is earlier. Conversely, if it is more expensive to sample
vectors ($b_V > b_H$) then the switch time becomes later.

It is easy to show that for the parameter values given in Table 1, 
if a vector is initially infected, then $\frac{I_V N_H}{N_V I_H}$ is always 
greater than $\frac{6}{50}$ and the optimal sampling strategy is to expend 
the entire budget sampling vectors. If it becomes more expensive to 
sample hosts ($b_H > b_V$), then the optimal sampling scheme does not 
change. Conversely, if it becomes sufficiently less expensive to sample 
hosts ($b_H < b_V$) then there is a switch from sampling only vectors to 
sampling only hosts. As $b_H$ becomes smaller, the switch time becomes 
earlier. In the following section we show how to solve for the switch 
times using an approximation to this model. 

Optimal sampling of the linearized SI model. Here we 
consider specific solutions of the linearized SI system, given by 
system (2), under the linear cost function described in the previous 
section: $C_{max} = s_V + s_H$. This simplified system can provide 
practitioners with some insight on sampling design when not 
much detailed information is known about the functional form of 
disease dynamics. When there is doubt about the validity of these 
approximations, the full solutions developed in the previous 
section can be used to determine the range over which linear 
approximations will be useful. Here and through the rest of the 
manuscript we only consider the introduction of disease by the 
more migratory host population, though calculations are provided 
for alternative initial conditions in Text S4. 

Applying the KKT conditions (Text S2) to the linear solutions (3) 
shows it is optimal to place all of the sampling effort into either 
the host or vector populations when cost functions are linear. As 
above, we determined the stratum to sample by calculating and 
comparing the economic efficiencies of each stratum. The economic 
efficiencies with pooled sampling for this system are given by 
$\eta_H = p_H \frac{I_H(t)}{b_H N_H}$ and $\eta_V = p_V \frac{I_V(t)}{b_V N_V}$. 
With this model it is possible for the sampled population to switch at 
some critical time as discussed in the previous section, denoted 
$\tau_{H \to V}$, when the most economically efficient stratum changes 
from host to vector. If such a switch occurs, the critical time occurs 
when 



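$$\eta_H(\tau_{H \to V}) = \eta_V(\tau_{H \to V})$$

$$\frac{p_H I_H(0) \cosh(\lambda \tau_{H \to V})}{b_H N_H} = \frac{p_V I_H(0) N_V \beta_{V,H} \sinh(\lambda \tau_{H \to V})}{b_V N_V \lambda N_H}$$

$$\tau_{H \to V} = \frac{1}{\lambda} \operatorname{atanh}\left( \frac{\lambda p_H b_V}{\beta_{V,H} p_V b_H} \right) \qquad (24)$$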






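Substituting $\lambda = \sqrt{\beta_{H,V}\beta_{V,H}}$ in the above equation, 
and since atanh is defined only when its argument is less than one in 
absolute value, we see that $\tau_{H \to V}$ exists only if 

$$\sqrt{\frac{\beta_{H,V}}{\beta_{V,H}}}\,\frac{p_H b_V}{p_V b_H} < 1. \qquad (25)$$

Additionally, since $\frac{d}{dt}\eta_V(t) > \frac{d}{dt}\eta_H(t)$ whenever 
$\sqrt{\frac{\beta_{H,V}}{\beta_{V,H}}}\,\frac{p_H b_V}{p_V b_H} < 1$, vectors 
are the most economically efficient stratum at times later than 
$\tau_{H \to V}$. The critical time for the alternative case, 
($I_V(0) > 0$, $I_H(0) = 0$), is given in Text S4. 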

The linearized SI sampling model applied to West Nile 
virus. We now revisit the example of West Nile virus using the 
linearized approximations derived in the previous section. First, 
the presence of a critical time in the binomial sampling model was 
demonstrated because 
$\sqrt{\beta_{H,V}/\beta_{V,H}}\; p_H b_V / (p_V b_H) = 0.28$ (parameter 
values given in Table 1), which satisfied condition (25). Using (24), the 
critical time to switch from sampling hosts to vectors was 
$\tau_{H \to V} = 8.6$ days. This is a short timescale when the sampling 
period is weekly; therefore, if the population is large then it is 
unlikely that the virus will be detected in the first week after initial 
infection and a reasonable time-independent strategy is to place 
most or all of the sampling effort into the vector population. 
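The two numbers quoted above can be checked directly from equation (24) and condition (25). The Python sketch below does so under stated assumptions: the function name `si_switch_time` is hypothetical, the transmission rates 0.0792 and 0.0144 are taken to be $\beta_{H,V}$ and $\beta_{V,H}$ from Table 1, and the per-sample costs are set equal (only their ratio matters).

```python
import math

def si_switch_time(beta_hv, beta_vh, p_h, p_v, b_h=1.0, b_v=1.0):
    """Host-to-vector switch time in the linearized SI model, equation (24).

    Returns (arg, tau), where arg is the left-hand side of condition (25).
    tau is None when arg is not below one, i.e. no switch time exists.
    """
    lam = math.sqrt(beta_hv * beta_vh)             # lambda in equation (24)
    arg = lam * p_h * b_v / (beta_vh * p_v * b_h)  # argument of atanh
    if not 0.0 < arg < 1.0:
        return arg, None
    return arg, math.atanh(arg) / lam

arg, tau = si_switch_time(beta_hv=0.0792, beta_vh=0.0144, p_h=6, p_v=50)
print(f"condition (25) value: {arg:.2f}")
print(f"switch time: {tau:.1f} days")
```

Under these assumptions the sketch gives a condition value of about 0.28 and a switch time of about 8.6 days, in line with the values reported in the text.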

We also explored optimal sampling in the full SI model through 
simulation. This allowed us to determine how well our decisions 
based on binomial sampling with linearized dynamics approximated 
the full SI model with geometric sampling. We used the 
same parameter values as above but we varied both population 
abundances independently over several orders of magnitude. $N_H$ 
was varied from $10^3$ to $10^{10}$ individuals and $N_V$ from $10^4$ to 
$10^{10}$ individuals. For each combination of population abundances 
considered, we simulated the dynamics of an SI epidemic using 
numerical solutions to the nonlinear system (1). We then calculated 
the expected time to disease discovery using (12) and tested each 
potential sampling strategy (the host sample size, $s_H$, ranged over 
the integers 0 to 88 and the vector sample size was $s_V = 88 - s_H$). We 
found that the optimal sampling strategy, $s^*$, for all population 
abundances considered was dependent on $N_H$ and independent of 
$N_V$. When $N_H > 3299$ the optimal solution was ($s^*_V = 88$, $s^*_H = 0$), 
when $N_H < 1399$ the optimal solution was ($s^*_V = 0$, $s^*_H = 88$), while 
within this relatively narrow range of abundances we found that a 
linear relationship described a mixed ideal strategy, where 
$s^*_H = -0.046 N_H + 152.80$ and $s^*_V = 88 - s^*_H$. Slope and intercept 
coefficients were calculated from simulation output. This suggests 
that when the sample size, $s_H$, is less than 16% of the total 
population size, $N_H$, then the linearized system provides a 
reasonable approximation. 
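The reported simulation outcome can be summarized as a piecewise rule in $N_H$. The Python sketch below simply encodes the thresholds and fitted coefficients quoted above; it does not rerun the simulations. The function names and the clamping of the fitted line to the budget are illustrative assumptions.

```python
BUDGET = 88  # weekly number of samples, C_max, as assumed in the text

def reported_host_samples(n_h):
    """Reported optimal host sample size s*_H for host abundance n_h."""
    if n_h > 3299:
        return 0.0                       # all effort goes to vectors
    if n_h < 1399:
        return float(BUDGET)             # all effort goes to hosts
    s_h = -0.046 * n_h + 152.80          # fitted line in the mixed region
    return min(float(BUDGET), max(0.0, s_h))  # clamp (an assumption)

def reported_design(n_h):
    """Return the pair (s*_V, s*_H) implied by the reported rule."""
    s_h = reported_host_samples(n_h)
    return BUDGET - s_h, s_h

for n_h in (1_000, 2_000, 3_000, 10_000):
    print(n_h, reported_design(n_h))
```

The fitted line evaluates to roughly 88 at $N_H = 1399$ and roughly 1 at $N_H = 3299$, so the three pieces join almost continuously at the reported thresholds.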

Finally, we looked at the error due to suboptimal sampling by 
calculating the difference in expected time to detection between the 
applied sampling design consistent with the parameters we inferred 
from the Hillsborough County mosquito control agency, with 
$s_V = 75$, $s_H = 13$, and the optimal sampling design $s^*(N_H)$. 
Proportional error levels were high when the host population 
abundances were low, with an error of approximately 30% when 
$N_H \approx 10^3$, but this value quickly decreased (Figure 2). The 
absolute error in expected detection timing between the optimal and 
suboptimal sampling designs was about a week for all population 
abundances considered. 



Figure 2. Percent difference in the expected detection time 
between the optimal, $s^*$, and suboptimal, ($s_V = 75$, $s_H = 13$), 
sampling schemes as a function of the host population 
abundance, $N_H$. The vector sample size is denoted as $s_V$, and 
the host sample size is denoted by $s_H$. 
doi:10.1371/journal.pcbi.1003668.g002 

SIR dynamics 

A solution to the linearized SIR host-SI vector 
dynamics. In the case where the host population follows the 
SIR dynamics defined in (5) we must consider sampling between 
infected vectors ($I_V(t)$), infected hosts ($I_H(t)$), and recovered hosts 
($R_H(t)$). As a host progresses through the infected class and into 
recovery the relative quantities of virus and antibodies within the 
host change. Typically, different testing procedures are used to 
detect infected individuals and recovered individuals [28,29]. 
Therefore, a framework to determine whether to sample for 
infected or recovered individuals can save resources and potentially 
lead to faster detection times. Here, using the expression for 
$P(D(t))$ (10), the linearized solutions (6), and assuming a linear cost 
function (15), we solve for critical times at which the optimal 
sampling strategy changes. 

We consider four critical times: $\tau_{H \to V}$, $\tau_{H \to R}$, 
$\tau_{V \to R}$, and $\tau_{R \to V}$. The first, $\tau_{H \to V}$, is the 
time at which the optimal sampling strategy switches from sampling 
only infected hosts to sampling only the vector population. The time 
at which the optimal sampling strategy switches from sampling 
infected hosts to sampling recovered hosts is $\tau_{H \to R}$. The times 
$\tau_{V \to R}$ and $\tau_{R \to V}$ give the critical switches from 
sampling only the vector population to recovered hosts or vice versa. 
The other potential critical times $\tau_{V \to H}$ and $\tau_{R \to H}$ are 
not considered here because they do not exist at early times for these 
initial conditions ($I_V(0) = 0$, $I_H(0) > 0$, $R_H(0) = 0$), but 
expressions are given for alternative initial conditions in Text S4. 

The first critical time, $\tau_{H \to V}$, is given by the equivalence of 
the economic efficiencies for the infected vectors and hosts, 
$\eta_H(\tau_{H \to V}) = \eta_V(\tau_{H \to V})$. Similar to the process in 
the SI model, we plug in $I_V(\tau_{H \to V})$ and $I_H(\tau_{H \to V})$ from 
(6) into the economic efficiencies and solve for $\tau_{H \to V}$. This 
gives 

$$\tau_{H \to V} = \frac{2}{\mu} \operatorname{atanh}\left( \frac{\mu b_V p_H}{2\beta_{H,V} b_H p_V + \gamma b_V p_H} \right) \qquad (26)$$






Note that since the argument of atanh must be less than one in 
absolute value, $\tau_{H \to V}$ exists only if 
$\frac{\mu b_V p_H}{2\beta_{H,V} b_H p_V + \gamma b_V p_H} < 1$. Additionally, 
since $\frac{d}{dt}\eta_V(t) > \frac{d}{dt}\eta_H(t)$ whenever 
$\frac{\mu b_V p_H}{2\beta_{H,V} b_H p_V + \gamma b_V p_H} < 1$, vectors are the 
most economically efficient class at times later than $\tau_{H \to V}$. 
This formula is analogous to the SI case given in equation (24), with 
some slight modifications due to the additional recovery state. As the 
recovery rate $\gamma$ goes to 0, the above expression for 
$\tau_{H \to V}$ approaches the SI formula given in (24). 

The second case, $\tau_{H \to R}$, occurs when the economic efficiencies 
of infected and recovered hosts are equal, $\eta_H = \eta_R$. We first need 
to define $b_R$ and $p_R$, the cost per sample and pooling size of the 
recovered host stratum. Plugging in solutions from (6) gives 

$$\tau_{H \to R} = \frac{2}{\mu} \operatorname{atanh}\left( \frac{\mu b_R p_H}{2\gamma b_H p_R + \gamma b_R p_H} \right) \qquad (27)$$

where $b_R$ is the cost per sample for recovered hosts and 
$\tau_{H \to R}$ exists only if 
$\frac{\mu b_R p_H}{2\gamma b_H p_R + \gamma b_R p_H} < 1$. Additionally, since 
$\left.\frac{d}{dt}\eta_R(t)\right|_{t=\tau_{H \to R}} > \left.\frac{d}{dt}\eta_H(t)\right|_{t=\tau_{H \to R}}$ 
whenever $\frac{\mu b_R p_H}{2\gamma b_H p_R + \gamma b_R p_H} < 1$, recovered 
hosts are the most economically efficient class at times later than 
$\tau_{H \to R}$. 

The third and fourth cases are the switch from infected vectors 
to recovered hosts and vice versa. As before, we set the economic 
efficiencies equal, $\eta_V(t) = \eta_R(t)$, with appropriate substitutions 
for $I_V(t)$ and $R_H(t)$. With the linearized solutions (6), the ratio of 
the economic efficiencies is a constant, 

$$\frac{\eta_V(t)}{\eta_R(t)} = \frac{p_V b_R \beta_{V,H}}{p_R b_V \gamma} \qquad (28)$$

This means that $\tau_{V \to R}$ and $\tau_{R \to V}$ do not exist. Instead, 
the economic efficiencies determine which stratum is sampled by 
evaluating whether (28) is greater than or less than 1. If the ratio is 
greater than 1 it will be optimal to sample from the vector stratum; 
when less than 1 it is optimal to sample from the recovered host 
stratum. Critical times for the alternative initial conditions 
($I_V(0) > 0$, $I_H(0) = 0$, $R_H(0) = 0$) of the linearized SIR model with 
binomial sampling are provided in Text S4. 

An SIR model of West Nile virus. We first applied the 
critical time expressions derived in the previous section using the 
linear approximation (6) and binomial sampling model (14a) to 
determine the optimal sampling design, s*. We then used 
simulations to determine if these values were consistent with the 
nonlinear SIR (4) and geometric sampling model (14b). We 
explore the optimal sampling design as a function of the unknown 
recovery time of the bird populations, $1/\gamma$. 

Applying the linearized dynamics and binomial sampling 
model, we found that the critical switch between infected hosts 
and vectors, $\tau_{H \to V}$ (equation (26)), was less than two days for all 
recovery times (Figure 3). Therefore, with a weekly sampling 
protocol it will be more economically efficient to sample for 
infected vectors rather than for infected hosts. We next tested 
whether it is better to sample infected vectors or recovered hosts. 
Looking at equation (28), we found it is optimal to sample from the 
recovered host class when $1/\gamma < 8.33$. Otherwise, it is optimal to 
sample from the infected vectors class. Taken together, our 
analysis suggests that if the recovery time is less than about 8 days, 
it will be best to sample from the recovered host stratum; otherwise 
it will be best to sample from the infected vector stratum, and in all 
scenarios the infected host class can be ignored. 

Figure 3. The critical time as a function of the unknown 
recovery time, $1/\gamma$. Because the switch time between infected hosts 
and vectors ($\tau_{H \to V}$) is always less than two days, with weekly 
sampling periods it will be better to sample for infected vectors than for 
infected hosts. 
doi:10.1371/journal.pcbi.1003668.g003 
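Both decisions described for the linearized SIR model can be reproduced numerically from equations (26) and (28). The Python sketch below is illustrative only and rests on several assumptions that are not stated in this section: $\mu = \sqrt{\gamma^2 + 4\beta_{H,V}\beta_{V,H}}$ (the form of $\mu$ is defined with solutions (6), which are not shown here), a recovered-host pool size `P_R` equal to the host pool size of 6, and equal per-sample costs for all three strata. Equation (26) is used as printed, with $\beta_{H,V}$ in the denominator, and all function and constant names are hypothetical.

```python
import math

BETA_HV, BETA_VH = 0.0792, 0.0144   # transmission rates quoted from Table 1
P_V, P_H, P_R = 50, 6, 6            # pool sizes; P_R = P_H is an assumption
B_V = B_H = B_R = 1.0               # equal costs per sample (assumption)

def sir_host_to_vector_time(gamma):
    """Switch time from equation (26); mu uses an assumed closed form."""
    mu = math.sqrt(gamma**2 + 4.0 * BETA_HV * BETA_VH)
    arg = mu * B_V * P_H / (2.0 * BETA_HV * B_H * P_V + gamma * B_V * P_H)
    # atanh is only defined for arguments below one in absolute value.
    return 2.0 / mu * math.atanh(arg) if 0.0 < arg < 1.0 else None

def vector_to_recovered_ratio(gamma):
    """Constant efficiency ratio eta_V / eta_R from equation (28)."""
    return P_V * B_R * BETA_VH / (P_R * B_V * gamma)

for recovery_days in (1, 5, 8, 9, 20, 50):
    gamma = 1.0 / recovery_days
    ratio = vector_to_recovered_ratio(gamma)
    stratum = "vectors" if ratio > 1.0 else "recovered hosts"
    print(recovery_days, round(sir_host_to_vector_time(gamma), 2), stratum)
```

Under these assumptions the ratio in (28) equals one at $1/\gamma = 6/(50 \times 0.0144) \approx 8.33$ days, and the computed host-to-vector switch time stays below two days across recovery times of 1 to 50 days, in line with the statements in the text.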

Applying equation (26) can potentially be misleading when the 
initial infection is presumed to be in the host population and it 
really originates in the vector. In this case the true switch time is 
given by $\tau_{V \to H}$ (Text S4, equation (S27)) for which a switch time 
does not exist for any $\gamma$. The switch from vectors to recovered hosts 
for these alternative initial conditions is given by equation (S27), 
and this exists when $1/\gamma \le 8.33$. However, the switch times predicted 
are all very large with $\tau_{V \to R} > 100$ days so the optimal strategy 
would be to always sample for infected vectors. This suggests there 
would be error in this scenario from poor assumptions about the 
initial condition when $1/\gamma < 8.33$, though for higher recovery times 
we correctly decide to sample infected vectors. We examine the 
potential costs of this incorrect decision in terms of the expected 
time to detection at the end of the following analysis. 

We examined the robustness of our predictions using simulations 
of the full SIR model with geometric sampling to find the 
optimal sampling design $s^*$ for recovery times ranging from 1 to 50 
days ($\gamma = 1$ to $1/50$). Our model predictions showed that $s^*$ 
depends on the recovery time in a manner similar to the linearized 
solutions; however, at low $N_H$ there was a narrow region of 
parameter space where a mixed solution was optimal. At sampling 
levels where the host sample size, $s_H$, was ~20% or more of the 
total host population size, $N_H$, the optimal solution, $s^*$, depended 
on the infected host population mixed with either recovered hosts 
($1/\gamma < 8$) or infected vectors ($1/\gamma > 8$) (Figure 4). The critical 
value of $1/\gamma = 8$ defined where the solution switches between 
recovered hosts and infected vectors, consistent with predictions from 
the linearized model. There were no values of $s^*$ where all three 
classes were to be sampled. Overall, the linearized results provided 
a useful guide to the optimal sampling design except for the 
narrow region of parameter space where mixed sampling designs 
were found to be optimal at low $N_H$ (Figure 4). 

Figure 4. The optimal sampling strategy as a function of host population abundance, $N_H$, and the time to recovery, $1/\gamma$. The pure 
magenta region corresponds to an optimal sampling design of ($s^*_V = 88$, $s^*_H = 0$, $s^*_R = 0$), where the vector sample size is denoted as $s_V$, the sample size 
for testing infected hosts is denoted by $s_H$, and the sample size for testing recovered hosts is denoted by $s_R$. The pure cyan region corresponds to an 
optimal sampling design of ($s^*_V = 0$, $s^*_H = 0$, $s^*_R = 88$). These two regions are separated at recovery times of $1/\gamma = 8$. Regions with yellow correspond to 
mixed solutions of the form ($s^*_V = 0$, $s^*_H > 0$, $s^*_R > 0$) when $1/\gamma < 8$, or ($s^*_V > 0$, $s^*_H = 0$, $s^*_R > 0$) when $1/\gamma > 8$. 
doi:10.1371/journal.pcbi.1003668.g004 

Our results from the linearized binomial sampling models for 
both the SI and SIR dynamics appeared robust to several of our 
assumptions, as both the linearized dynamics and the choice of 
sampling model had little effect on the optimal sampling 
design for West Nile virus. However, our results were sensitive to 
low host population sizes where the nonlinear models suggest that 
a mixed sampling design that incorporates both vectors and 
infected hosts will be optimal when it is possible to sample a 
significant proportion of the host population. In cases where the 
host population is sufficiently large, our analysis recommends 
placing sampling effort into infected vector populations given that 
current evidence suggests either long recovery times or persistent 
infections for West Nile virus in bird populations [30,31]. When 
the sampling size is 20% or more of the total population size then 
more detailed models should be explored such as the full SI and 
SIR models. 

We tested the sensitivity of the time to detection on the initial 
conditions by simulating the optimal decision of an outbreak with 
initial conditions ($I_V(0) = 0$, $I_H(0) = 1$, $R_H(0) = 0$) when the 
outbreak actually occurred with initial conditions 
($I_V(0) = 1$, $I_H(0) = 0$, $R_H(0) = 0$) for $1/\gamma = 1$ to 30 and 
$N_H = 10^3$ and $10^4$. We found that when the recovery time, 
$1/\gamma > 8$, and $N_H$ was high the optimal decisions between the two 
initial conditions were consistent and there was no error (Figure 5). 
However, when $N_H$ was low, or the recovery time was less than 8 days, 
the optimal decisions differed strongly between initial conditions. 
This led to significant error in the expected time to detection, on the 
order of 5 to 6 weeks for low recovery times and 1-3 weeks for the low 
population sizes and high recovery times (Figure 5). 

Discussion 

Active surveillance is an important tool for decision makers; 
treating the process analytically can provide some important 
insights on how to conduct cost-efficient surveillance. Very little 
past work in mathematical epidemiology has focused on early 
detection despite these potential benefits. One of the important 
products of this analysis has been to explicitly define the kinds of 
data that will be needed to design basic surveillance studies. 
Specific knowledge about the costs associated with sampling 
different populations and information about disease transmission 
rates will be necessary when making very specific predictions, but, 
as we have shown, applying the procedure with only basic 
knowledge of these quantities can make predictions that may be 
robust. This is fortunate for monitoring agencies as ecological and 
epidemiological parameters can be difficult and costly to obtain. 
Our analysis of West Nile virus illustrates robustness to 
parameters that are often unknown over a variety of models and 
assumptions. 

Although we focused on basic SI and SIR models, this 
framework can be easily extended to include more specific models 
when they are available. It is likely that West Nile virus models 
that incorporate more biological realism (e.g., [32]) will be 
necessary to provide more targeted advice concerning surveillance 
practices for specific management agencies, whose monitoring 
capabilities may differ from what was assumed in this work. 
However, because the timescales we are examining are relatively 
short, our models may provide robust predictions when sampling 
is conducted over limited spatial scales. Therefore, even the 
simplified models examined here may be useful for designing 
sampling strategies when more detailed ecological and 
epidemiological information is not available. 

Figure 5. The sensitivity of the expected time to detection on 
initial conditions. We calculated the deviation from the minimal time 
to detection when we assume that an epidemic starts in the host 
population and the number of initially infected vectors is zero 
($I_V(0) = 0$, $I_H(0) = 1$), but the initial conditions of the outbreak are 
actually ($I_V(0) = 1$, $I_H(0) = 0$). We did this for two host population 
sizes of $N_H = 10^3$ and $N_H = 10^4$, and over a range of recovery times 
($1/\gamma$) from 1 to 30 days. 
doi:10.1371/journal.pcbi.1003668.g005 

Our results suggest that the optimal sampling design will often 
focus all sampling effort on a particular species or compartment. 
This result is due to the linear nature of the cost functions and the 
approximately linear nature of the dynamical systems as functions of 
our control variable, the sample sizes, S. These on-off or "bang-bang" 
types of solutions arise in other epidemiological problems 
when determining how to treat or remove individuals in infected 
populations to stop an epidemic [33-35]. More recent work on the 
control of epidemics suggests that when considering multiple control 
strategies the optimal solution is not simply an additive combination 
of the independent control solutions [36]. Similar results may hold 
for surveillance methods when combining different types of 
surveillance strategies, for example active and passive sampling 
strategies. In cases where linearity and large population 
approximations for the dynamics do not hold, our analysis suggests that the 
optimal sampling design can be a mixture of sampling strata but this 
occurs over a very limited parameter space for West Nile virus 
(Figure 4). Nonlinear cost functions may also arise when the cost per 
sample changes when performing a large number of samples due to 
reductions in the associated personnel costs or in the laboratory fees 
incurred in performing a large number of tests. Changing the 
dynamical model by incorporating more detailed ecological and 
epidemiological considerations may also reduce the robustness of 
our linearization approximation. For example, introducing spatially 
structured populations [37] or heterogeneous contact rates is 
known to lead to additional nonlinearities in incidence functions 
[38,39]. 



There are several additional considerations that may improve 
upon our efforts. Many disease models include exposed compartments 
(e.g., malaria [40]) in the host and/or vector population that 
can delay the onset of infectiousness once bitten. This may lead to 
additional possibilities in the switch time analysis that we did not 
consider. For example, if a host population is initially infected but 
has a long exposed period then there may be a quick switch to 
sampling the vector population followed by switches at longer time 
scales back to the host population. Additional important developments 
include treating the initial conditions and transmission 
process as random variables. This will likely lead to a distribution 
of optimal strategies rather than a single, fixed strategy [41]. 
Recognizing uncertainty in the initial conditions may be especially 
important when the source of infection is unclear given the 
potential sensitivity of the sampling process to the initial infections. 
We also did not consider the possibility of testing for multiple 
pathogens in this analysis. For example, in Florida, mosquito 
control agencies regularly screen for malaria, West Nile virus, and 
dengue fever among others [12]. Applying a mixed sampling 
strategy may allow managers to hedge their bets because the 
optimal strategy for West Nile virus may not necessarily translate 
to the early detection of other pathogens. Finally, our assumption 
that diagnostic tests for pathogen or antibodies provide perfect 
indicators of an individual's state may be violated by several 
factors. First, immunological dynamics can lead to low viral or 
antibody levels even when individuals have been infected, which 
may lead to low test reliability [29]. Extending the approach to 
coupled immunological-epidemiological models may account for 
this source of uncertainty. Second, and perhaps more importantly, 
imperfect diagnostic test reliability can arise due to stochastic 
factors that cannot be accounted for in conventional lab 
techniques. These effects can be incorporated into a sampling 
model by multiplying the economic efficiency by a random 
variable representing the test sensitivity and specificity [42]. 

Despite the recognized impact of emerging zoonoses on human 
health [43] we are aware of no work that attempts to integrate the 
active surveillance systems explored here with disease surveillance 
in humans. In diseases where humans are spillover hosts, such as 
West Nile virus, low human incidence is expected. Passive 
surveillance is often more economically efficient when dealing 
with rare events [44] but this reporting process differs from the 
assumptions made in this work. In passive surveillance the 
reporting effort will often vary through time due to seasonal and 
institutional effects. Incorporating these factors into a predictive 
framework will require the statistical analyses of these patterns 
[11]. When including the surveillance of humans for West Nile 
virus we expect that reductions in the time to detection will occur 
when the recovery rate (y) is high or the human population size is 
low relative to the vector population, as this is when hosts are 
most efficient to sample for detecting the disease (Figure 4), though 
the particular effects will depend on the amount of sampling effort 
and the transmission rate to humans from the vector. In general 
we expect that accounting for passive human surveillance of 
zoonoses may change the optimal active surveillance strategy for 
wildlife populations as it may not be necessary to sample hosts that 
have strong interactions with humans or species that significantly 
lag behind the epidemiological response of humans. 

Another important case that we did not consider here is that of 
zoonotic diseases such as avian influenza, which spread much 
more easily within one species than across species. For 
these diseases, the goal of surveillance is to detect a subtype of the 
disease more virulent in humans, indicated by sustained human to 
human transmission. This sampling needs to be tailored to detect 
clusters of human cases linked to a single avian-to-human 
transmission that deviate from what is to be expected from low- 
level human-to-human and bird-to-human transmission [45]. This 
kind of surveillance will require more detailed contact tracing that 
is not accounted for in our framework, though the basic structure 
we have described here could still be applied. More complex 
statistical analyses will also be needed to determine whether levels 
of infecteds and recovereds are significantly higher than 
background levels in order to determine if an outbreak is 
occurring. Analyses such as those determining epidemic thresholds 
from public health data (e.g. [45-47]) will be useful starting points 
for integrating thresholds into detecting epidemics of endemic 
zoonotic diseases. 
[0,1] x [0,1]. 

(TIFF) 

Figure S2 Possible solutions (V,H) of (S19). The red 
dashed line is the line V = H. 